1  Introduction


This report summarizes the research performed from July 1, 1994 through
August 31, 1997 under the project entitled “Pattern Recognition and Image
Analysis Extensions to the IE2000 EPToolkit,” Grant No. F30602-94-1-0010,
funded by the United States Air Force. The research was performed at the
University of Vermont by Prof. Robert R. Snapp and seven graduate students
under his supervision. The work resulted in one journal publication in the
Annals of Statistics, six conference papers, a 40-page technical report, an
X Windows software package, and a novel algorithm that efficiently
approximates a k-nearest neighbor classifier in low-dimensional feature
spaces. The quality of this work is reflected by the high reviewing standards
of the journals and conferences used to communicate these results, the award
of a competitive research grant from the U.S. Army Research Office to
continue this work,¹ and invitations to present talks describing some of
these results at colloquia at Cambridge University (Cambridge, England,
October 1994), Concordia University (Montreal, Canada, May 1995), Rensselaer
Polytechnic Institute (Troy, NY, April 1995), SUNY Buffalo (October 1995),
and Siemens Corporate Research (Princeton, NJ, October 1995). These research
results are described in greater detail in the following sections, and
complete copies of all publications are contained in Section 6.

Section 2 describes the theoretical results derived, including a finite-sample
analysis of the k-nearest neighbor classifier under different metrics, and
analytic support for the asymptotic optimality of a weighted Euclidean metric.
Section 3 describes two algorithms inspired by the theory: (i) a strategy for
estimating the Bayes risk of a practical pattern classification problem from a
set of classified patterns; and (ii) an efficient implementation of the
k-nearest neighbor classifier, called the labeled cell classifier. Section 4
describes an X Windows program that incorporates a graphical tool for building
labeled and unlabeled reference sets from multispectral images interactively,
and includes an efficient implementation of the k-nearest neighbor classifier
for classifying other pixels in the same, or related, images. Section 5
identifies the students who participated in this research and were supported
on this grant. Section 6 describes the publications that grew out of this
project. Section 7 summarizes and discusses the practical relevance of our
work to image exploitation.

1 “Finite Sample Analyses of Nearest Neighbor Algorithms,” U.S. Army Research
Office, DAAG55-98-1-0022.


2  Theoretical  Results 

The most significant results of the research were two theoretical discoveries
related to a finite-sample analysis of the k-nearest neighbor classifier [12],
one of the most popular pattern recognition algorithms in use today. In this
context, we assume that each pattern is a vector, constructed from a finite
number of measurements, or features [10]. As a simple example, each pixel in a
multispectral image can be represented as a pattern using the intensities of
the spectral bands as features. As a more general example, each pixel can be
represented using the intensities obtained from an array of image processing
filters that are centered about that pixel (e.g., edge, texture, or shape
detectors). In this way, salient information about the values of the
neighboring image pixels can be incorporated within a pattern. The number of
features (e.g., the sum of the number of spectral bands and the number of
filters) used to represent each pattern is called the dimensionality of the
feature space. In order for this scheme to be useful, features should be
selected so that patterns originating from distinct states of nature, or
classes, are more or less distinguishable. As it is rarely possible in
practice to describe analytically how patterns of a given class are generated,
almost every classification method is based on the information contained in a
training set of correctly labeled patterns, or reference sample, that is, a
set of feature vectors, each labeled by its true class.

Given a reference sample (or “training set”) of m labeled (i.e., classified)
feature vectors, the k nearest neighbor classifier assigns an input pattern x
to a class by identifying the subset of k feature vectors from the reference
sample that are closest to x using a predefined distance function (or metric).
The input pattern is then assigned to the class that appears most frequently
within the subset of k nearest neighbors.
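The rule just described can be sketched in a few lines of Python. This is only a minimal brute-force illustration, not the implementation used in the project: the function name knn_classify, the toy reference sample, and the default Euclidean metric are assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(reference, x, k, dist=math.dist):
    """Assign x to the class seen most often among its k nearest reference vectors.

    reference: list of (feature_vector, label) pairs, i.e. the labeled sample.
    dist: the metric; Euclidean by default (an assumption, any metric can be used).
    """
    # Sort the whole reference sample by distance to x and keep the k closest.
    nearest = sorted(reference, key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common(1) returns the label with the largest vote count; ties are
    # broken in favor of the label encountered first.
    return votes.most_common(1)[0][0]

# Hypothetical two-feature reference sample with two classes.
sample = [((0.0, 0.0), "water"), ((0.1, 0.2), "water"), ((0.2, 0.1), "water"),
          ((1.0, 1.0), "forest"), ((0.9, 1.1), "forest"), ((1.1, 0.8), "forest")]
label = knn_classify(sample, (0.15, 0.15), k=3)
print(label)
```

Sorting the entire sample costs time proportional to m log m per query, which is the cost that tree-based implementations aim to reduce.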

Despite its simplicity, this algorithm has been shown theoretically to be as
accurate as a Bayes classifier (the most accurate pattern classifier possible)
in the limit of an infinite sample size [35]. Fortunately, convergence to this
limit is rapid for many practical problems, which, along with the algorithm's
ease of use, is why it is so popular among practitioners. In image
exploitation, nearest neighbor methods can be used to compare the accuracy of
different feature representations for a given classification problem. Because
there is no extensive training phase, k nearest neighbor classifiers can be
quickly constructed and put on line as new application needs arise.


2.1  Finite-Sample  Analysis 

The research that we performed provides a quantitative description of how
this limit is achieved, and enables an improved understanding of this
classifier’s performance using finite reference samples. This work extends the
classic results of Cover and Hart [5, 4], and corrects the recent work of
Fukunaga and Hummels [19]. Specifically, in a series of papers [28, 29, 30, 31]
we showed that for classification problems that possess a certain degree of
regularity, the probability of error of the k nearest neighbor classifier,
P_m(error), can be accurately estimated from an asymptotic series of the form

\[
P_m(\mathrm{error}) = c_0 + \sum_{j=2}^{\infty} c_j \, m^{-j/n}. \tag{1}
\]

Here, c_0 = P_∞(error) denotes the expression derived by Cover and Hart [5]
for the probability of error of the k nearest neighbor classifier in the
infinite sample limit; m denotes the number of labeled patterns in the
reference sample, and n, the dimensionality of each pattern. As (1) is an
asymptotic expansion in the sense of Poincaré [11], it can be truncated at any
point, resulting in an error with the magnitude of the first neglected term.
We have also obtained analytic expressions for the leading expansion
coefficients, c_j, in the summation of Eqn. (1) in terms of the probability
distributions that define the pattern recognition problem, the value of k, and
the metric used.

Eqn.  (1)  is  significant  for  the  following  reasons: 

•  If the probability distributions that describe each pattern class are known,
an N-th order truncation of (1) can be used to predict the finite sample
accuracy of a k nearest neighbor classifier, as values of the expansion
coefficients P_∞(error), c_2, c_3, ..., c_N can be evaluated numerically from
the expressions published in [31].

•  If the probability distributions that describe each pattern class are not
known, as is the case in nearly every pattern recognition problem of
practical interest, then an N-th order truncation of (1) can be used to
predict the finite sample accuracy of a k nearest neighbor classifier,
as values of the expansion coefficients P_∞(error), c_2, c_3, ..., c_N can be
estimated statistically using standard resampling methods [27, 33]. In
particular, (1) can be used to estimate the practical benefit of acquiring
more reference data, and thus is useful for designing k nearest
neighbor classifiers. (Section 3.1 describes an extension of this idea for
estimating the Bayes risk from a labeled reference sample.)

•  Eqn. (1), and the analytic form of the leading coefficients, provide useful
fundamental insights about this algorithm. For example, the factor
of m^{-2/n} in the second term is an analytic validation of the curse of
dimensionality. Similarly, an analysis of coefficient c_2, described below,
demonstrates that a weighted Euclidean metric is asymptotically
optimal for the class of problems considered by this analysis.
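To make the role of the m^{-2/n} factor concrete, the following sketch evaluates a truncated form of Eqn. (1) and computes how much the reference sample must grow to shrink the leading correction term by a given factor. The coefficient values are hypothetical and chosen only for illustration; the function names are likewise assumptions of this example.

```python
def truncated_error(m, n, coeffs):
    """Evaluate c_0 + sum over j >= 2 of c_j * m**(-j/n).

    coeffs maps the index j to c_j, with index 0 holding c_0.
    Indices that are absent are treated as zero coefficients.
    """
    return coeffs[0] + sum(c * m ** (-j / n) for j, c in coeffs.items() if j >= 2)

def sample_size_factor(n, reduction=2.0):
    """Factor by which m must grow so that m**(-2/n) shrinks by `reduction`.

    Solving (f * m)**(-2/n) = m**(-2/n) / reduction gives f = reduction**(n/2).
    """
    return reduction ** (n / 2)

coeffs = {0: 0.05, 2: 0.10}  # hypothetical c_0 and c_2, for illustration only
for n in (2, 5, 10, 20):
    print(n, truncated_error(10_000, n, coeffs), sample_size_factor(n))
```

Halving the leading correction requires doubling the sample when n = 2, but multiplying it by 2^{10} = 1024 when n = 20, which is one way of reading the curse of dimensionality off the expansion.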

2.2  Asymptotic  optimality  of  the  Euclidean  metric 

A pressing issue in applied pattern recognition is how to design a pattern
classifier for a given problem. In the context of the k nearest neighbor
algorithm, one might ask what metric yields the most accurate classifier. This
is an open problem, and the answer generally depends upon specifics of the
problem. Nevertheless, for the class of sufficiently smooth problems our work
demonstrates that, for sufficiently large sample size, a weighted Euclidean
metric is the optimal global L_p metric [30, 31].

To  show  this,  we  considered  a  broad  class  of  global  metrics  derived  from 
the  standard  Lp  norm: 

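\[
\|x\|_p =
\begin{cases}
\left( |x_1|^p + \cdots + |x_n|^p \right)^{1/p} & \text{if } 1 \le p < \infty, \\
\max_{1 \le i \le n} |x_i| & \text{if } p = \infty,
\end{cases}
\]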

and  assumed  the  general  global  metric 

\[
d(x, y) = \| A (x - y) \|_p
\]

where A is an arbitrary nonsingular n-by-n matrix, and p is chosen from the
interval 1 ≤ p ≤ ∞. Under these assumptions, we showed that Eqn. (1) converges
uniformly with respect to values of A and p. Since the leading coefficient c_0
does not depend on these values, the optimal asymptotic metric can be found by
finding the values of A and p that minimize the next most significant
coefficient, namely c_2. Surprisingly, the optimal value of p equals 2,
independent of the specifics of the pattern recognition problem. We also
obtained an expression for the optimal weight matrix A in terms of the
probability distributions that define the given pattern recognition problem.
Numerical simulations were also used to demonstrate the practical significance
of these findings [31].
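The family of metrics d(x, y) = ||A(x - y)||_p is straightforward to compute. The sketch below is a plain-Python illustration; the function name and the diagonal weight matrix are hypothetical, and the expression for the optimal A derived in [31] is not reproduced here.

```python
import math

def weighted_lp_distance(x, y, A, p):
    """Return ||A (x - y)||_p for a square matrix A (list of rows) and p >= 1.

    Pass p = math.inf for the max norm.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product z = A (x - y).
    z = [sum(a_ij * d_j for a_ij, d_j in zip(row, diff)) for row in A]
    if math.isinf(p):
        return max(abs(v) for v in z)
    return sum(abs(v) ** p for v in z) ** (1.0 / p)

A = [[2.0, 0.0], [0.0, 0.5]]  # hypothetical weight matrix, illustration only
x, y = [1.0, 2.0], [0.0, 0.0]
for p in (1, 2, math.inf):
    print(p, weighted_lp_distance(x, y, A, p))
```

With p = 2 this is a weighted Euclidean metric, the case identified above as asymptotically optimal; a function of this shape can be passed as the metric of any nearest neighbor search.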


3  Algorithms  Developed 

Two promising algorithms were developed during the course of this project.
The first, a method for estimating the Bayes risk of a practical pattern
recognition problem from real data [27, 33], stems directly from the
theoretical analysis described in the previous section. The second algorithm,
the labeled cell classifier, is a computationally efficient approximation to a
k nearest neighbor classifier [24]. Both algorithms are described below.

3.1  Estimating  the  Bayes  risk 

Given a pattern classification problem, a Bayes classifier is defined to be a
pattern classifier that minimizes the probability of error (or in more general
terms the statistical risk, as some misclassifications may incur a greater cost
than others). Computationally, a Bayes classifier assigns each input pattern
to the class that has the maximum posterior probability [10]. The probability
of error of such a classifier is called the Bayes risk; we shall denote its
value by R_B.

In practice the construction of a Bayes classifier generally requires
knowledge of the probability distributions that define the given pattern
classification problem. Unfortunately, this information is usually not
available for problems of practical interest. Nevertheless, our research
demonstrates that accurate estimates of the Bayes risk can be obtained from a
sufficiently large reference sample of labeled feature vectors. Estimates of
the Bayes risk can facilitate the design of better classifiers. For example,
since the value of R_B depends upon the set of features chosen to represent
each pattern, one might compare estimates of the Bayes risk for a number of
different feature sets, and then select the representation that yields the
smallest value. Several previous efforts have utilized the k nearest neighbor
classifier towards this end [6, 20, 21, 23]. Our work reported in references
[27, 33] takes advantage of Eqn. (1), the most detailed parametric model
available of the accuracy of this pattern classifier as a function of the
reference sample size m. Using a large pool of classified data, one can
construct a sequence of k nearest neighbor classifiers of varying sample sizes
and estimate the error rate of each empirically. Using standard least squares
methods, one can then obtain estimates for the unknown coefficients
c_0, c_2, ..., c_N. Of greatest utility is the estimated value of c_0, which
can be used to place upper and lower bounds on the value of the Bayes risk
R_B. To enable an estimate of R_B with precision ε, choose k > 2/ε², and
estimate c_0 by the above method. After inverting an inequality derived by
Devroye [7] one obtains


\[
c_0 - \epsilon \le R_B \le c_0.
\]


The practical utility of this method was demonstrated by two experiments, one
using synthetic data, where the true Bayes risk was known, and the other using
a pattern classification problem using imagery obtained from the IE2000 group
at Rome Laboratory (see Fig. 1).
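The least squares step of this procedure can be sketched as follows. This is an illustration under stated assumptions, not the code used in [27, 33]: the function names are hypothetical, the fit is restricted to the j = 2 and j = 4 terms (the form shown in Fig. 1), the system is solved through the normal equations for brevity, and the input error rates are generated noise-free from the coefficients quoted in the caption of Fig. 1 rather than measured.

```python
def fit_expansion(sizes, errors, n, orders=(2, 4)):
    """Least-squares fit of R_m ~ c_0 + sum over j in orders of c_j * m**(-j/n).

    Returns [c_0, c_j for each j in orders].
    """
    rows = [[1.0] + [m ** (-j / n) for j in orders] for m in sizes]
    p = len(rows[0])
    # Normal equations (X^T X) c = X^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * e for r, e in zip(rows, errors))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

def bayes_risk_bounds(c0, eps):
    """Bounds c_0 - eps <= R_B <= c_0, valid when k was chosen with k > 2/eps**2."""
    return c0 - eps, c0

n = 5  # five spectral components per pattern, as in Fig. 1
sizes = [100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000]
# Stand-in for empirical error estimates at each sample size.
errors = [0.0758 + 0.124 * m ** (-2 / n) + 0.0133 * m ** (-4 / n) for m in sizes]
c0, c2, c4 = fit_expansion(sizes, errors, n)
print(c0, bayes_risk_bounds(c0, eps=0.01))
```

With real data each entry of errors would be an empirical error rate averaged over resampled reference sets of that size, and a numerically more robust solver (for example a QR-based least squares routine) would be preferable to the normal equations.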

3.2  Labeled  cell  classifier 

Although the k nearest neighbor classifier is accurate and easy to adapt to
new classification problems, the time that it needs to classify an input
pattern increases with the size of the reference sample. Over the years,
different techniques have been developed to simplify the search for the k
nearest neighbors in a reference sample. (That the Sixth DIMACS Implementation
Challenge for 1998 centers on this task demonstrates that this remains a ripe
problem [9].) Most approaches fall within two broad categories: (i) exact
implementations that restrict the search for nearest neighbors by organizing
the data in a hierarchical manner [13, 15, 17, 14], or (ii) approximate
implementations that edit the reference sample [1, 16, 36]. In our study [24],
we constructed a hybrid approach.

3.2.1  k-d  trees 

The labeled cell algorithm, described in Section 3.2.2, is based on the
implementation of Friedman, Bentley, and Finkel [14] that organizes the
reference sample X_m into an n-dimensional binary tree, such that the root
node represents the entire feature space, and each node in the tree represents
an isothetic cell that contains a subset of X_m. The two descendants of each
nonterminal node divide the parent cell along one coordinate, called the key,
such that the number of reference patterns in each child cell differs at most
by one. The key may be the coordinate of greatest variation of the reference
vectors in the parent cell, and the threshold may be the median of their
projections along the chosen coordinate. Pairs of descendants are added
recursively until the number of vectors in a cell does not exceed a bucket
size b. Note that the nodes at a constant depth represent a partition of the
feature space, as do the leaf nodes. Fig. 2 displays a k-d tree constructed
from a reference sample of 16 points in ℝ², with b = 2.

Figure 1: A fourth-order (N = 4) fit of Eqn. (1) to 33 empirical estimates of
R_{m_i} for a pixel classification problem obtained from a multispectral
satellite image. Patterns were constructed using five spectral components of
each image pixel. The fourth-order fit, R_m = 0.0758 + 0.124 m^{-2/5} +
0.0133 m^{-4/5}, with constant term 0.0758, is plotted as a solid curve on a
log-log scale to reveal the significance of the j = 2 term.

After the tree is completed, the k feature vectors in the tree that are
nearest to a given input pattern x can be identified. A priority queue is used
to maintain the k feature vectors encountered so far that are closest to x.
Beginning with the root, nodes in the tree are examined recursively until it
is certain that the k nearest neighbors have been found. If the current node
is a leaf node, then the priority queue is updated after examining its b or
fewer feature vectors. Otherwise the key i and threshold value t of the node
are examined, and the recursive procedure is applied first to the descendant
that falls on the same side of t as x_i, and then to its sibling. For
efficiency, nodes are only examined if their cell boundaries are closer to x
than the k-th nearest neighbor found so far (the bounds-overlap-ball test);
and the search is stopped as soon as the k-th nearest neighbor is closer to x
than the boundaries of every unexamined cell (the ball-within-bounds test).

Figure 2: (a) A k-d tree of depth three is constructed from this set of
sixteen feature vectors in ℝ² that forms the root of the tree. (b) The set is
bisected into left and right portions, forming the two descendant nodes of the
root, as the largest variation appears along the horizontal coordinate.
(c) Each resulting subset is further divided into two equal partitions along
the vertical coordinate, forming the four nodes at depth two in the tree.
(d) Each resulting subset is then divided along the coordinate of greatest
variation. Each resulting cell, labeled with a Greek letter, contains two
feature vectors, and forms a leaf node of the k-d tree (e).
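The construction and search just described can be sketched as below. This is a simplified illustration rather than the implementation of [14]: the names Node, build, search, and classify are hypothetical, the key is chosen by largest coordinate range (an assumed proxy for "greatest variation"), the pruning test checks only the distance to the splitting plane instead of the full cell bounds, and the ball-within-bounds early stop is omitted. The search remains exact under these simplifications.

```python
import heapq
import math
from collections import Counter

class Node:
    """A k-d tree node: leaves hold a bucket of points, other nodes hold a split."""
    def __init__(self, points=None, key=None, threshold=None, left=None, right=None):
        self.points, self.key, self.threshold = points, key, threshold
        self.left, self.right = left, right

def build(points, bucket_size):
    """points: list of (vector, label) pairs; requires bucket_size >= 1."""
    if len(points) <= bucket_size:
        return Node(points=points)
    dims = range(len(points[0][0]))
    # Key: coordinate with the largest spread among the points of this cell.
    key = max(dims, key=lambda i: max(p[0][i] for p in points) - min(p[0][i] for p in points))
    ordered = sorted(points, key=lambda p: p[0][key])
    mid = len(ordered) // 2  # median split: child sizes differ by at most one
    return Node(key=key, threshold=ordered[mid][0][key],
                left=build(ordered[:mid], bucket_size),
                right=build(ordered[mid:], bucket_size))

def search(node, x, k, heap=None):
    """Return a heap of (-distance, label) for the k nearest neighbors of x."""
    if heap is None:
        heap = []
    if node.points is not None:  # leaf: examine its b or fewer vectors
        for vec, label in node.points:
            d = math.dist(vec, x)
            if len(heap) < k:
                heapq.heappush(heap, (-d, label))
            elif d < -heap[0][0]:  # closer than the current k-th neighbor
                heapq.heapreplace(heap, (-d, label))
        return heap
    if x[node.key] <= node.threshold:
        near, far = node.left, node.right
    else:
        near, far = node.right, node.left
    search(near, x, k, heap)
    # Visit the sibling only if the splitting plane is closer to x than the
    # current k-th neighbor (a simplified bounds-overlap-ball test).
    if len(heap) < k or abs(x[node.key] - node.threshold) < -heap[0][0]:
        search(far, x, k, heap)
    return heap

def classify(tree, x, k):
    """Majority vote among the k nearest neighbors found by the tree search."""
    votes = Counter(label for _, label in search(tree, x, k))
    return votes.most_common(1)[0][0]

reference = [((0.1, 0.2), "a"), ((0.2, 0.1), "a"), ((0.15, 0.3), "a"),
             ((0.8, 0.9), "b"), ((0.9, 0.8), "b"), ((0.85, 0.7), "b")]
tree = build(reference, bucket_size=2)
print(classify(tree, (0.2, 0.2), k=3))
```

The negated distances turn Python's min-heap into a max-heap, so heap[0] always holds the farthest of the k candidates, which is the value the pruning test needs.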


3.2.2  Labeling  the  cells 

The labeled cell algorithm is designed to reduce the number of feature vectors
examined during each classification. As in the previous implementation, the
reference sample is organized into a multidimensional binary search tree using
the coordinates of the feature vectors as keys. An integer k' ≥ k and a
fraction α > 0 are selected. A central test vector from each leaf cell is then
classified with an exact k'-nearest-neighbor classifier (e.g., the previous
implementation). This test vector could be the centroid of the leaf cell
(assuming it is compact), or the sample mean of its reference vectors. If the
number of k'-nearest-neighbors that belong to the most frequent class exceeds
⌊αk'⌋, then the leaf cell is given the label of that class.² (Otherwise, it
remains unlabeled.) Nonterminal nodes are examined recursively: if two
siblings share a common class label, then their parent is assigned the same
label.

Input patterns are classified by the k-d tree algorithm, with one important
exception: if an input pattern belongs to a cell that is labeled, then it is
immediately assigned to the indicated class. Thus no reference vectors are
examined if an input falls within a labeled cell. For different values of α,
k', and k, the labeled cell algorithm implements a variety of classifiers:
α = 1 yields an exact k-nearest neighbor classifier, and α ≤ 1/C (where C
denotes the number of classes), a pure cell classifier.
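The labeling rule can be sketched as two small functions. This is a hedged illustration, not the published implementation [24]: the function names are hypothetical, and the threshold is read as floor(α k'), an interpretation of the extracted text that agrees with the zero-one loss criterion (local risk estimate below 1 - α) stated in the accompanying footnote.

```python
import math
from collections import Counter

def leaf_label(neighbor_labels, alpha):
    """Return the label for a leaf cell, or None when the cell stays unlabeled.

    neighbor_labels: classes of the k' nearest neighbors of the cell's central
    test vector. The cell is labeled when the count of the most frequent class
    exceeds floor(alpha * k') (threshold form assumed, see the lead-in).
    """
    k_prime = len(neighbor_labels)
    label, count = Counter(neighbor_labels).most_common(1)[0]
    return label if count > math.floor(alpha * k_prime) else None

def parent_label(left_label, right_label):
    """A nonterminal node inherits a label only when both children share it."""
    if left_label is not None and left_label == right_label:
        return left_label
    return None

votes = [1] * 9 + [2] * 2  # hypothetical neighbor classes with k' = 11
for alpha in (0.5, 0.7, 0.9, 1.0):
    print(alpha, leaf_label(votes, alpha))
```

With α = 1 the count can never exceed k', so no cell is labeled and every query falls through to the exact search; lowering α labels more cells and shifts work from query time to construction time.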

The classes assigned to patterns that fall within the labeled cells may differ
occasionally from the results of the k-nearest neighbor algorithm. Thus, like
Hart’s condensed nearest neighbor rule [16], the labeled cell classifier only
approximates the classic algorithm. However, computer experiments suggest that
if a classification needs to be performed in a fixed amount of time, then the
new algorithm may attain greater accuracy than other implementations of the
k-nearest-neighbor classifier, as the computation saved in the labeled cells
allows this new algorithm to process a larger reference sample.

3.2.3  Experimental  results 

Two problems illustrate the differences in performance and accuracy between
labeled cell and exact k-d tree implementations of the k-nearest neighbor
classifier. The first assumes two equally probable, normally distributed
classes in ℝ³. Thus the class-conditional probability densities are

\[
f_\ell(x) = \frac{1}{(2\pi\sigma^2)^{3/2}}
\exp\!\left( -\frac{(x_1 + (-1)^\ell \mu)^2 + x_2^2 + x_3^2}{2\sigma^2} \right)
\]

for ℓ ∈ {1, 2}. The classification accuracy (i.e., the expected probability of
error), and the expected number of operations per classification are
empirically estimated from a sequence of independent trials. For each trial a
random reference sample of m = 10,000 patterns is used to classify several
thousand independent input vectors. The number of operations is estimated
heuristically: each comparison and addition count as one operation, and each
multiplication as two. (Qualitatively similar results are obtained with a
variety of weighting factors.) Results for k = 11, a Euclidean metric, and
eight values of 2μ/σ are displayed in Fig. 3. In this example, the greatest
absolute deviation in accuracy between two implementations occurs at 2μ/σ = 6
and α = 0.5, where the labeled cell classifier misclassifies 0.15% of the
independent test patterns, and the k-d tree implementation misclassifies
0.14%. Note in particular how the recursive labeling scheme accelerates the
performance as the class separation is increased, with little degradation in
accuracy.

2 For simplicity, we assume a zero-one loss matrix, so cells are labeled if
their local estimate of the conditional risk is less than 1 - α. It is
straightforward to generalize the algorithm to an asymmetric, multiclass risk
function.

Figure 3: A semilogarithmic plot obtained from a classification problem with
two normally distributed classes in ℝ³; the horizontal axis is the class
separation 2μ/σ. The circular, triangular, and square markers describe the
average performance of hundreds of labeled cell classifiers with α equal to
0.9, 0.7, and 0.5 (a pure cell classifier) respectively. In all cases
k = k' = 11. The five-pointed stars describe the performance of a k-d tree
implementation of an 11-nearest-neighbor classifier. Vertical error bars all
lie within each marker.
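The synthetic problem of the first experiment is easy to reproduce. The sketch below draws labeled patterns from the two class-conditional densities and evaluates the Bayes risk in closed form; the closed form Φ(-μ/σ) is a standard result for two equally probable Gaussians with a common spherical covariance and is not stated in the text, and the function names are hypothetical.

```python
import math
import random

def sample_pattern(mu, sigma, rng):
    """Draw (x, label) with label 1 or 2 equally likely.

    Class l has mean (-(-1)**l * mu, 0, 0), matching the exponent
    (x_1 + (-1)**l * mu)**2 in the density, and standard deviation sigma
    along each coordinate.
    """
    label = rng.choice((1, 2))
    mean1 = -((-1) ** label) * mu
    x = (rng.gauss(mean1, sigma), rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return x, label

def bayes_risk(mu, sigma):
    """Bayes error Phi(-mu/sigma): the Gaussian tail beyond the midpoint x_1 = 0."""
    return 0.5 * math.erfc(mu / (sigma * math.sqrt(2.0)))

rng = random.Random(1)
reference = [sample_pattern(3.0, 1.0, rng) for _ in range(1000)]
print(bayes_risk(3.0, 1.0))
```

For a class separation of 2μ/σ = 6 the closed form gives a Bayes risk of roughly 0.135%, which is close to the 0.14% error reported above for the exact 11-nearest-neighbor classifier.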

The second problem uses data extracted from a seven-band digital image. We let
each pixel define an independent pattern. The first band is quantized about
the median to obtain a binary class label. A six-dimensional feature vector is
formed with the remaining spectral bands. Reference and test patterns are
selected independently from the image. Fig. 4 displays the trade-off between
the classification accuracy and the computational cost for four different
reference sample sizes as well as four different values of α. These results
suggest that the recursive labeling scheme accelerates classification with
only a small reduction in accuracy. Note that by increasing the size of the
reference sample, it is possible to obtain a labeled cell classifier that is
both significantly faster and more accurate than a k-d tree classifier. Thus
the new algorithm may be useful for real-time applications that provide an
abundant supply of classified data. The estimates, redisplayed in Fig. 5,
validate that the average classification time of labeled cell classifiers is
also O(log m), but with smaller constants of proportionality β. Preliminary
comparative experiments suggest that the labeled cell classifier is
competitive with other approximations of the k nearest neighbor algorithm.
Moreover, recursive labeling can be combined with early truncation (Arya and
Mount [1]) to yield even faster implementations.

Figure 4: Results of the second experiment in which six-dimensional pixels,
belonging to two different classes, were classified by three different labeled
cell classifiers (k' = k = 7), and a k-d tree implementation of a
7-nearest-neighbor classifier. The reference sample size appears to the right
of each marker in thousands. The horizontal axis is logarithmic.

These simulations suggest that the labeled cell classifier is most useful for
problems that provide an abundant supply of classified patterns, are described
by smooth probability distributions, and have a small Bayes risk (e.g., pixel
classification of satellite images).

Figure 5: Empirical estimates of the average number of operations required
for each classification as a function of the sample size m for the second
experiment. The linear graphs represent least-squares fits of the form
β log₁₀ m + γ. (Note that the horizontal axis is logarithmic.)

4  Software  Production 

The most successful software to emerge from this project is a stand-alone
X Windows application called pstool that enables a user to interactively
construct a labeled reference sample from a multispectral digital image (in
either LANDSAT-TM or TIFF format) and classify other regions of the image
using a k-nearest neighbor classifier. This application was brought to Rome
Labs for a demo in August 1995, and a revised version was placed on an FTP
server in the Spring of 1996. The program was also used by an image processing
group at Rensselaer Polytechnic Institute in Troy, New York. The program was
written in C in a modular fashion, using updated IPToolkit modules.

The program proved to be useful for our subsequent experimental research in
Bayes risk estimation, and in designing faster implementations of k nearest
neighbor classifiers.

Several graduate students also contributed software to the project, including
a C++ class library of three different neural network training algorithms, an
implementation of the time-difference reinforcement learning algorithm, and
algorithms for detecting roads in digital images.


5  Students  Supervised 

This grant helped further the education and professional training of seven
graduate students at the University of Vermont, four of whom received Master
of Science degrees and one of whom received a Ph.D. Students benefited from
research assistantships awarded during the summer and, for Mr. Yong Feng,
during the academic year. Their names are listed below, along with the degrees
they received.

•  Mr.  Tong  Xu,  M.S.  in  Electrical  Engineering,  1995. 

•  Dr.  Alessandro  Palau,  Ph.D.  in  Electrical  Engineering,  1997. 

•  Mr.  Xianguan  Li,  M.S.  in  Electrical  Engineering,  1997. 

•  Mr.  Yong  Feng,  M.S.  in  Computer  Science,  1997. 

•  Mr.  Chaoyu  Jin,  M.S.  in  Electrical  Engineering,  1997. 

•  Mr.  Qing  Ye,  graduate  student  in  Computer  Science 

•  Mr.  Shawn  Ma,  graduate  student  in  Computer  Science 


6  List  of  Publications 

The  most  significant  results  of  this  research  project  appear  in  seven  papers: 
six  were  accepted  by  peer  reviewed  conferences,  and  one,  by  the  Annals  of 
Statistics,  the  flagship  and  stringently  reviewed  journal  of  the  Institute  of 
Mathematical  Statistics.  Copies  of  these  papers  appear  in  the  Appendix  of 
this  report.  An  eighth  paper,  with  Alessandro  M.  Palau  on  the  labeled  cell 
classifier,  is  in  progress. 
[...] sample risk of the k-nearest-neighbor classifier,” Proceedings of the [...]
[...] neighbor classifier under the Lp metric,” Proceedings of the 1994 IEEE- [...]
[...]drecht, Netherlands, 1996, pp. 295-302.
[...] Metric,” Proceedings of the 1995 IEEE International Symposium on
Information Theory, (Whistler, Canada), September 1995.

(e) R. R. Snapp and T. Xu, “Estimating the Bayes Risk from Sample Data,” in
D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, eds., Advances in Neural
Information Processing Systems, vol. 8, Cambridge, MA: MIT Press, 1996,
pp. 232-238.

(f) A. M. Palau and R. R. Snapp, “The labeled cell classifier: a fast
approximation to k nearest neighbors,” in A. K. Jain, S. Venkatesh, and
B. C. Lovell, eds., Proceedings of the 14th International Conference on
Pattern Recognition, vol. 1, IEEE Computer Society Press: Los Alamitos, CA,
1998, pp. 823-827.

(g) R. R. Snapp and S. S. Venkatesh, “Asymptotic expansions of the k-nearest
neighbor risk,” Annals of Statistics, vol. 26, no. 3, pp. 850-878, 1998.

(h) R. R. Snapp and S. S. Venkatesh, “Asymptotic derivation of the
finite-sample risk of the k nearest neighbor classifier,” Technical Report
UVM-CS-1998-0101, Department of Computer Science, University of Vermont, 1998
(40 pages).

6.2  Manuscripts


(i) A. M. Palau and R. R. Snapp, “The labeled cell classifier,” (in progress,
to be submitted to IEEE Trans. Pattern Anal. and Mach. Intell.).


7  Summary  and  Practical  Consequences 

For the image analyst, and indeed for any practitioner, pattern recognition
has been, and continues to be, an empirical science. Algorithms are evaluated
on their efficiency and accuracy when applied to the problems of interest.
Because of the diverse structure of practical classification problems, even
in the context of image analysis, it seems unlikely that there exists a
unique optimal classification algorithm [8]. However, for many applications
the k nearest neighbor algorithm serves as a nearly optimal practical pattern
classifier. For example, it is the most popular classification algorithm in
handwritten document analysis [25, 34], and a competitive benchmark in
general [3].

The results of this study should help practitioners in every field,
including image analysis, make better use of the k nearest neighbor
classifier. First, the asymptotic analysis described by Eqn. (1) (see also
[31, 32]) provides a parametric model of the accuracy of this classifier in
terms of the reference sample size. In two conference articles [27, 33] (see
Figure 1) we demonstrated that this model is valid in the context of pixel
classification in multispectral images. Thus, the practitioner can use
Eqn. (1), with the least squares technique described in [27, 33], to predict
the accuracy of the k nearest neighbor classifier for a range of sample
sizes. This information should help answer the question, “How large a
reference sample should I use to obtain a pattern classifier that is accurate
to within x% of the asymptotic limit?”
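
The least squares procedure mentioned above can be sketched roughly as
follows. This is only an illustration under stated assumptions: it assumes
that Eqn. (1) has the truncated form R_n = R_inf + c_2 n^(-2/d) + c_3 n^(-3/d)
for feature dimension d, and it reads "within x% of the asymptotic limit" as
R_n <= (1 + x/100) R_inf. The function names, the number of terms, and the
synthetic risk values are hypothetical and are not taken from [27, 33].

```python
import numpy as np

def fit_risk_model(n, risk, d, n_terms=2):
    """Fit risk ~ r_inf + sum_j c_j * n**(-j/d) for j = 2, ..., n_terms + 1."""
    n = np.asarray(n, dtype=float)
    risk = np.asarray(risk, dtype=float)
    powers = np.arange(2, 2 + n_terms)
    # Design matrix: a constant column for r_inf, then one column per power.
    A = np.column_stack([np.ones_like(n)] + [n ** (-j / d) for j in powers])
    coef, *_ = np.linalg.lstsq(A, risk, rcond=None)
    return coef[0], coef[1:]

def predict_risk(n, r_inf, c, d):
    """Evaluate the assumed model at one or more sample sizes n."""
    n = np.asarray(n, dtype=float)
    powers = np.arange(2, 2 + len(c))
    return r_inf + sum(cj * n ** (-j / d) for cj, j in zip(c, powers))

def sample_size_within(r_inf, c, d, x_percent, n_max=10**8):
    """First n on a geometric grid whose predicted risk is within x% of r_inf."""
    target = r_inf * (1.0 + x_percent / 100.0)
    grid = np.logspace(0, np.log10(n_max), 4000)
    candidates = np.unique(np.round(grid).astype(np.int64))
    ok = predict_risk(candidates, r_inf, c, d) <= target
    return int(candidates[ok][0]) if ok.any() else None

# Synthetic (not measured) error rates generated from the assumed model.
d = 4
n_obs = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
risk_obs = 0.10 + 0.5 * n_obs ** (-2 / d) - 0.2 * n_obs ** (-3 / d)

r_inf_hat, c_hat = fit_risk_model(n_obs, risk_obs, d)
n_needed = sample_size_within(r_inf_hat, c_hat, d, x_percent=10)
```

In practice the observed error rates are noisy, so the fitted coefficients,
and hence the predicted sample size, should be treated as estimates with
their own uncertainty.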

Our study also demonstrates analytically how the accuracy of the k nearest
neighbor classifier can be enhanced by the selection of an appropriate
metric, or distance function. We have shown that for a large class of
problems, a weighted Euclidean metric is the optimal global Lp metric.
Future research based upon this work may yield methods for discovering the
optimal local metric directly from the reference data. This would allow
practitioners to design more accurate nonparametric pattern classifiers for
practical problems. The benefits of this line of research should be most
pronounced for classification problems in high dimensional feature spaces
[22], such as those encountered in multispectral and hyperspectral image
analysis.
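
To make the role of the metric concrete, the following minimal sketch
classifies a query point with a k nearest neighbor rule under a weighted
Euclidean distance. It assumes diagonal (per-feature) weights, which is a
simplification: the weighted metric analyzed in this work may be more general.
The function name knn_predict and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_ref, y_ref, X_query, k=3, weights=None):
    """k-NN majority vote under d(x, z)^2 = sum_i w_i * (x_i - z_i)^2."""
    X_ref = np.asarray(X_ref, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_ref = np.asarray(y_ref)
    if weights is None:
        w = np.ones(X_ref.shape[1])  # plain Euclidean metric
    else:
        w = np.asarray(weights, dtype=float)
    # Pairwise squared weighted distances, shape (n_query, n_ref).
    diff = X_query[:, None, :] - X_ref[None, :, :]
    dist2 = np.einsum("qrd,d->qr", diff ** 2, w)
    nearest = np.argsort(dist2, axis=1)[:, :k]
    predictions = []
    for row in y_ref[nearest]:
        # Majority vote; ties go to the smallest label value.
        values, counts = np.unique(row, return_counts=True)
        predictions.append(values[np.argmax(counts)])
    return np.array(predictions)

# Toy data: the class depends only on the first feature; the second is noise.
X_ref = [[0.0, 0.0], [0.0, 10.0], [1.0, 5.0], [1.0, -5.0]]
y_ref = [0, 0, 1, 1]
query = [[0.1, 5.0]]

label_plain = knn_predict(X_ref, y_ref, query, k=1)[0]
label_weighted = knn_predict(X_ref, y_ref, query, k=1, weights=[1.0, 0.0])[0]
```

With equal weights the noisy second feature dominates the distance and the
query is assigned to class 1; setting its weight to zero recovers class 0.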

The main intent of [27, 33] was to demonstrate how the model of Eqn. (1) can
be inverted to obtain estimates of the accuracy of the Bayes classifier for
practical pattern classification problems. This knowledge allows the
practitioner to compare the intrinsic accuracy of competing representations
of a given classification problem. The question “Which spectral bands and
image processing filters should I use to represent patterns for identifying
objects of class x in environment y?” is an instance of the problem of
feature selection, which remains the most important (and perhaps the most
difficult) unsolved problem in the field of pattern recognition.

The labeled cell classifier, described above, provides an accurate
approximation to the k nearest neighbor classifier in applications where the
classification time is critical. As such situations seem likely to occur in
defense applications, this algorithm should be of interest to the Air Force.

Finally, the software extensions that we have developed have provided a
useful bridge between our theoretical and empirical investigations, allowing
us to efficiently construct labeled reference samples of pixel-based patterns
from multispectral and TIFF images.